Of P-values and Bayes: a modest proposal.

Author

  • S N Goodman
Abstract

I am delighted to be invited to comment on the use of P-values, but at the same time, it depresses me. Why? So much brainpower, ink, and passion have been expended on this subject for so long, yet plus ça change, plus c’est la même chose – the more things change, the more they stay the same. The references on this topic encompass innumerable disciplines, going back almost to the moment that P-values were introduced (by R.A. Fisher in the 1920s). The introduction of hypothesis testing in 1933 precipitated more intense engagement, caused by the subsuming of Fisher’s “significance test” into the hypothesis test machinery.1–9 The discussion has continued ever since. I have been foolish enough to think I could whistle into this hurricane and be heard.10–12 But we (and I) still use P-values. And when a journal like EPIDEMIOLOGY takes a principled stand against them,13 epidemiologists who may recognize the limitations of P-values still feel as if they are being forced to walk on one leg.14

So why do those of us who criticize the use of P-values bother to continue doing so? Isn’t the “real world” telling us something – that we are wrong, that the effort is quixotic, or that this is too trivial an issue for epidemiologists to spend time on? Admittedly, this is not the most pressing methodologic issue facing epidemiologists. Still, I will try to argue that the topic is worthy of serious consideration.

Let me begin with an observation. When epidemiologists informally communicate their results (in talks, meeting presentations, or policy discussions), the balance between biology, methodology, data, and context is often appropriate. There is an emphasis on presenting a coherent epidemiologic or pathophysiologic “story,” with comparatively little talk of statistical “rejection” or other related tomfoolery. But this same sensibility is often not reflected in published papers. Here, the structure of presentation is more rigid, and statistical summaries seem to have more power.
Within these confines, the narrative flow becomes secondary to the distillation of complex data, and inferences seem to flow from the data almost automatically. It is this automaticity of inference that is most distressing, and for which the elimination of P-values has been attempted as a curative.

Although I applaud the motivation of attempts to eliminate P-values, they have failed in the past and I predict that they will continue to fail. This is because they treat the symptoms and not the underlying mindset, which must be our target. We must change how we think about science itself.

I and others have discussed the connections between statistics and scientific philosophy elsewhere,11,12,15–22 so I will cut to the chase here. The root cause of our problem is a philosophy of scientific inference that is supported by the statistical methodology in dominant use. This philosophy might best be described as a form of “naïve inductivism,”23 a belief that all scientists seeing the same data should come to the same conclusions. By implication, anyone who draws a different conclusion must be doing so for nonscientific reasons. It takes as given the statistical models we impose on data, and treats the estimated parameters of such models as direct mirrors of reality rather than as highly filtered and potentially distorted views. It is a belief that scientific reasoning requires little more than statistical model fitting, or in our case, reporting odds ratios, P-values and the like, to arrive at the truth.

How is this philosophy manifest in research reports? One merely has to look at their organization. Traditionally, the findings of a paper are stated at the beginning of the discussion section. It is as if the finding is something derived directly from the results section. Reasoning and external facts come afterward, if at all. That is, in essence, naïve inductivism.
This view of the scientific enterprise is aided and abetted by the P-value in a variety of ways, some obvious, some subtle. The obvious way is in its role in the reject/accept hypothesis test machinery. The more subtle way is in the fact that the P-value is a probability – something absolute, with nothing external needed for its interpretation.

Now let us imagine another world – a world in which we use an inferential index that does not tell us where we stand, but how much distance we have covered. Imagine a number that does not tell us what we know, but how much we have learned. Such a number could lead us to think very differently about the role of data in making inferences, and in turn lead us to write about our data in a profoundly different manner. This is not an imaginary world; such a number exists. It is called the Bayes factor.15,17,25 It is the data compo...

Department of Oncology, Division of Biostatistics, Johns Hopkins School of Medicine, Baltimore, MD.




Journal:
  • Epidemiology

Volume 12, Issue 3

Pages: -

Publication date: 2001